Whole genome sequencing identifies associations for nonsyndromic sagittal craniosynostosis with the intergenic region of BMP2 and noncoding RNA gene LINC01428

Craniosynostosis (CS) is a major birth defect resulting from premature fusion of cranial sutures. Nonsyndromic CS occurs more frequently than syndromic CS, with sagittal nonsyndromic craniosynostosis (sNCS) presenting as the most common CS phenotype. Previous genome-wide association and targeted sequencing analyses of sNCS have identified multiple associated loci, with the strongest association on chromosome 20. Herein, we report the first whole-genome sequencing study of sNCS using 63 proband-parent trios. Sequencing data for these trios were analyzed using the transmission disequilibrium test (TDT) and rare variant TDT (rvTDT) to identify high-risk rare gene variants. Sequencing data were also examined for copy number variants (CNVs) and de novo variants. TDT analysis identified a highly significant locus at 20p12.3, localized to the intergenic region between BMP2 and the noncoding RNA gene LINC01428. Three variants (rs6054763, rs6054764, rs932517) were identified as potential causal variants due to their probability of being transcription factor binding sites, deleterious combined annotation dependent depletion scores, and high minor allele enrichment in probands. Morphometric analysis of cranial vault shape in an unaffected cohort validated the effect of these three single nucleotide variants (SNVs) on dolichocephaly. No genome-wide significant rare variants, de novo loci, or CNVs were identified. Future efforts to identify risk variants for sNCS should include sequencing of larger and more diverse population samples and increased omics analyses, such as RNA-seq and ATAC-seq.


Study population
A total of 100 probands diagnosed with sNCS and their unaffected parents were selected for this study from the cohort of ~ 1500 families with NCS that were recruited through the collaborative efforts of investigators in the International Craniosynostosis Consortium (https:// health.ucdav is.edu/ pedia trics/ resea rch/ labs/ boyd-genet ics-lab/ crani ostudy).This sample included 23 trios with variants significantly associated with sNCS in our previous GWAS.The study was approved by the Institutional Review Board of the University of California Davis and the corresponding entities at each collaborating site.The study protocol was conducted in accordance with the guidelines required by these approvals, including obtaining signed informed consents from all study participants prior to review of medical documentation, clinical examinations, and specimen collection.Evaluation of probands and their parents was conducted via clinical examination by a clinical geneticist; no proband was found to have additional major birth defects or developmental delays, and no parent was identified to have a major birth defect.DNA was extracted from blood, saliva, or mouthwash according to the manufacturer's protocol using the Gentra Puregene Blood Kit (QIAGEN).All proband specimens were tested and were negative for variants in the hot spots of FGFR1, FGFR2, FGFR3 and the entire TWIST1 gene, as described elsewhere 25 .DNA from a total of 63 proband-parent trios successfully underwent WGS as described below; 7 of these trios included additional related individuals with sNCS.

TDT analysis
Sequencing data for the 63 families were analyzed using TDT in PLINK 28 , with testing being performed between each variant and sNCS.Seven of the families had sib-pairs; a proband-parent trio was extracted from those families by selecting the three individuals with the highest genotype call rates that formed the complete trio.In total, 189 individuals (63 trios) were analyzed by TDT.Two stratified analyses were also performed.The first stratified analysis focused on families with only one affected child; thus, the 21 individuals (7 trios) that were extracted from the multiplex families were removed, leaving 168 individuals (56 trios).The second stratified analysis was to examine results in families not included in our original published GWAS to serve as an internal replication for our WGS analysis; thus, the 69 individuals (23 trios) that were used in our original GWAS were removed, leaving 40 trios with 120 individuals.

Rare variant TDT analysis
To leverage RVs in our WGS dataset, gene-based versions of the TDT (ran through rvTDT 24 ) were used to capture RVs that might be underpowered in our traditional TDT analysis.The rvTDT uses collapsing methods (including combined multivariate collapsing [CMC] and variable threshold [VT] methods) to create a gene-based marker that corresponds to a gene or intergenic region for TDT analysis as a single marker.RVs were defined as those in our dataset with a minor allele frequency (MAF) ≤ 0.02 in the non-Finnish European gnomAD population.

Variant annotation
Variants were annotated using wANNOVAR 30 .Of interest were potential RVs in coding regions, with rare being defined as a MAF < 1% in gnomAD non-Finnish European database, as these variants would not be analyzed under the traditional TDT analysis.Noncoding variants were annotated using RegulomeDB 31 with the goal of determining potential transcription factor binding sites.CADD scores [32][33][34] and ClinVar data 35 were used in annotation.

Effect of sNCS risk variants on cranial vault shape
The effects of variants nominated through WGS on normal-range human cranial vault shape were interrogated.These data were generated by leveraging imaging (whole head magnetic resonance [MR] scans) and SNV array (Affymetrix NIDA Smokescreen™) on 6772 individuals available through the Adolescent Brain Cognitive Development (ABCD) study cohort (Data release 3.0) 36 .The working hypothesis was that variants impacting sNCS risk will also lead to a tendency toward a dolichocephalic vault shape in the general population.The hypothesis was tested by examining the effects of sNCS risk variants separately and as haplotypes (when possible) on measures of 3D vault shape based on a recently completed GWAS (Goovaerts et al. 37 ).This GWAS contains a detailed description of the methods used to derive the 3D vault phenotypes from MR scans and the subsequent investigation of SNV effects on vault shape.

Analytical sample
The final cleaned dataset contained 33,470,718 SNVs and indels in 63 sNCS families, comprised 196 individuals (63 probands, 7 affected sibs, and 126 unaffected parents).All individuals were of non-Finnish European ancestry collected from the United States, Australia, Bulgaria, and Italy.Our initial TDT analysis removed the 7 affected sibs leaving 63 probands (109 males; 57% male) and 126 unaffected parents for analysis.
The analytical sample used to identify associations for rs6054748 and rs1884302 included some of the trios from our previous GWAS 10 .These trios were selected for sequencing based on carrying this variant to allow us to examine a much more granular view of these loci and the potential for identifying the true causal variant.A closer look at the GWS variants on chromosome 20 shows a small, linked haplotype containing these variants localized to a 30,000 bp region flanked by rs1159531 (7,122,948 bp) to rs1401362781 (7,144,987 bp) (Fig. 2).To ensure that the GWS variants identified on chromosome 20 were not being driven by the seven multiplex families from which we had extracted a proband-parent trio, we reran association tests excluding these trios.Even with the slight reduction in statistical power expected due to decreased sample size, 10 variants remained GWS (Fig. 3) These 10 variants were a subset of the original 24 GWS variants from the 63 trios.No other GWS variants were observed.The two variants with the highest significance remained rs1884302 and rs6054748 (p-values = 9.36e − 9; ORs = 5.7) (Table 2).
We also wanted to examine how much of the association signal at this locus was driven by the 40 trios included in our WGS that were not included in our prior GWAS.Among this reduced sample set, no variants were identified as GWS, although three variants in complete linkage disequilibrium (LD) rs6054761, rs6054763, and rs932517 showed the strongest signal with identical p-values and ORs (p = 1.54e − 5, OR = 5.5).Only two variants on chromosome 17 near KCNJ2 showed lower p-values (Table 3).

rvTDT
The rvTDT analysis on our 63 trios did not yield any GWS results at a GWS threshold of 1e-5.The region with the highest significance was observed between LL22NC01-81G9.3 and TEX33 on chromosome 22 (p-value = 3.5e − 5),  www.nature.com/scientificreports/and the gene with the highest significance was HOXB13 on chromosome 17 (p-value = 2e − 4) (Supplemental Table 1).
Although our GWS locus on chromosome 20 was driven by common variants, we explored whether these variants might be tagging rare causal variants; however, the intergenic region between BMP2 and LINC01428 was not significant (p-value = 0.99).As such, it is likely that the association signal at this locus is driven solely by common variation.

CNV and de novo analysis
No GWS CNVs (<5 e − 8) were identified, and no significant de novo variants were found.

Causal variant identification
No GWS variants identified in our TDT analysis were in coding regions, making identification of potential causal variants more challenging.One approach for determining causality for noncoding variants is to cross-reference association results with functional annotations from reference datasets, including variants known to affect RNA expression (eQTLs), splicing (sQTLs), or protein expression (pQTLs).No variant in our identified haplotype was found to be a known QTL of any kind.Another approach is determining if a variant is located in a likely transcription factor binding site (TFBS).Three variants (rs6054763, rs6054764, rs932517) that are clustered close together with less than 800 bp among them had a probability of 55-61% of being a TFBS.Chromatin immunoprecipitation (ChIP) data from RegulomeDB revealed potential binding for multiple TFs for each of these variants (Table 4).TBX21 and MEF2B are shared between rs6054764 and rs932517 and FOS is shared between rs6054763 and rs6054764.www.nature.com/scientificreports/CADD scores can be used to predict the potential deleteriousness of non-coding variants, as they incorporate additional annotations besides protein prediction, such as conservation, DNase hypersensitivity, and splicing sites into their scores [32][33][34] (Table 1).The three variants with the highest CADD scores were the same three variants with the highest TFBS binding scores, rs6054763 (CADD score 6.75), rs6054764 (CADD score 9.99), and rs932517 (CADD score 4.07).Additionally, the minor allele frequency of each variant was highly enriched in our sNCS cohort compared to the general population, with rs6054764 enriched by 16% and rs6054763 and rs932517 enriched by 11% compared to gnomAD non-Finnish Europeans (Table 5).Further, as shown by the results of our analysis removing the trios used in our original GWAS, these SNVs were the most significant along the haplotype in the 40 new trios.Taking the TFBS, CADD, and allele enrichment evidence into account, these three closely clustered variants within this 800 bp region seem to be the most promising candidates for causality.Linkage disequilibrium (LD analysis brings the total haplotype to about 1100 bp; the entire set of GWS variants lie on a haplotype of 14,893 bp.

Haplotypes of sNCS risk variants impact cranial vault shape in the general population
With variation near BMP2 being implicated in both sNCS and normal-range variation in cranial vault shape, we aimed to investigate how the locus comprising sNCS risk variants rs6054763, rs932517, and rs6054764 affected cranial vault shape in the general population based on a recent GWAS (Goovaerts et al. 37 ) in a multi-ancestry cohort of 6772 adolescents.Each variant was GWS and presented dolichocephalic tendency for the minor allele with very similar overall shape effects across the variants due to high LD (Supplemental Fig. 1).
To further elucidate how variation at this locus relates to cranial vault shape variation in the general population, we investigated how haplotypes of sNCS risk variants rs6054763, rs932517, and rs6054764 affect cranial vault shape.Among ABCD participants with recent European ancestry (N = 5746), rs6054763 (G/C), rs932517 (A/G), and rs6054764 (T/C) constitute three distinct haplotypes: CGC, GAT, and GAC with haplotype frequencies of 28%, 68%, and 4% respectively, closely matching those in the European populations of the 1000 Genomes Phase 3 dataset (28%, 67%, and 5%).ABCD participants were stratified according to diplotype (i.e. the haplotype pair on homologous chromosomes; six possible combinations) and for each group we extracted 3D cranial vault shape from T1-weighted MR images as previously described (Goovaerts et al. 37 ).After adjustment for covariates (age, sex, height, weight, cranial size, 10 genetic ancestry principal components, and the model of MR scanner used), the average cranial vault shape of each group was compared with that of the homozygous GAT carriers as the reference group because this haplotype was devoid of any sNCS risk alleles.Furthermore, cranial index was measured from adjusted 3D cranial vault shape for each individual as the ratio of maximum cranial width and maximum cranial length.For the CGC and GAC haplotypes, association with cranial index was assessed by a two-sample t-test (two-tailed), testing the null hypothesis that carriers of at least one index haplotype share the same average cranial index as homozygous GAT carriers.
Figure 4 illustrates how the narrowing and elongation of the cranial vault is most pronounced in the homozygous GAC carriers, with the C allele for rs6054764 being the only risk allele on the haplotype.Additionally, cranial index (CI), defined as the ratio of maximal cranial width over maximal cranial length, was significantly lower in carriers of at least one GAC haplotype versus the references (p-value: 0.0252).No significant difference in CI was observed between the reference group and carriers of at least one CGC haplotype, which notably contains all three risk alleles (p-value: 0.3446).

Discussion
We previously found a signal for sNCS at 20p12.3 in the intergenic space between BMP2 and LINC01428 in using a GWAS 10 and a targeted sequencing study 38 .These findings motivated our current use of WGS data for 63 affected trios, which allowed for additional granularity to identify potential causal variants in this region.Overall, we identified 24 variants (all with MAF ≥ 0.39) as GWS within the intergenic region; the only variants that were GWS in our entire dataset.Applying an rvTDT analysis did not show the RVs in this region to be significant.This suggests that these common variants themselves are likely, in part, to be causal for the observed locus.It is possible that no single variant is solely causal for the phenotype; instead, the effect may be additive.
Identifying which of these variants were potentially causal within the small haplotype was challenging.rvTDT analysis confirmed that the locus does not appear to be driven by a RV with a large effect, as these would have been captured in set-based tests.Nor does the locus appear to be driven by de novo variants, as no de novo variant in the region was shared by more than two trios.Finally, copy number repeat variation (CNV) analysis failed to identify any associated CNVs and no GWS variants were found to be known eQTLs or pQTLs.
With these findings, our best evidence for approximating causality is around SNVs rs6054763, rs6054764, and rs932517.These variants are predicted to be TFBS and predicted damaging by CADD.Upon running stratified analyses, these variants remained significant after the multiplex families were removed but were not GWS after removing the 23 previously analyzed GWAS trios (although no variant was), which is reasonable after having removed so many trios from a small dataset.However, these variants had the third lowest p-values overall and the lowest along the haplotype, suggesting that the 40 trios not included in our original GWAS are the primary drivers of this locus and provide an internal replication of our GWAS findings for BMP2.
The variant rs6054764 is particularly interesting, as it is predicted to be the most deleterious by CADD and is the most highly enriched minor allele, with the minor allele present in 48% of the dataset, compared to 32% in the European population.This variant is also associated with another bone related trait, heel bone mineral density 39 .
In an effort to further identify putative functional noncoding regions, we analyzed ChIP databases.ChIP evidence gives multiple potential TFs as binders, but the most interesting is CTCF, which was identified by ChIP in multiple organs/tissue, including bone.CTCF is particularly intriguing as our group had previous identified a link between CTCF and mNCS 11 .In that study, the most significantly associated SNV, rs6127972 at the BMP7 locus, was found to reside in a linkage disequilibrium block that overlapped a CTCF TFBS; that binding site was significantly hypomethylated in mesenchymal stem cells derived from fused metopic compared to open sutures 11 .Reduced expression of CTCF itself has been found to cause craniofacial malformations in mice 40 , and further CTCF binding sites have been localized near other genes that direct craniofacial development 41 .Other TFs identified by ChIP at this 800 bp region also have interesting biological evidence.FOS, for which both www.nature.com/scientificreports/rs6054764 and rs6054763 are TFBS is known to be increased in the FGF2 pathway that stimulates premature cranial suture fusion 42 .RAD51 is a known risk gene for orofacial clefts 43 , and mutations in EP300 are responsible for Rubinstein-Taybi syndrome, which causes craniofacial defects 44 .
It is also possible that this region is modulating the expression of either BMP2 or LINC01428.There has been little published regarding the function of the lncRNA gene.lncRNAs have been found to play a role in NCS; the lncRNA HOTAIR, which has previous associations with osteogenesis and osteoarthritis, was recently found to be dysregulated in NCS and results in impaired osteoclast differentiation 45 .Further, the master bone transcription factor RUNX2 is modulated by lncRNAs, though not the particular lncRNA associated in our study 46 .Conversely, it is likely that our three variants of interest are not solely responsible for the locus; with the most probable explanation being some sort of additive effect caused by multiple potential causal variants in the region.
An interesting item of note is that we did not see a significant variant at BBS9 on chromosome 7, which had been previously observed in our original GWAS 10,38 .The association is still present, but at a less significant p-value (lowest p-value = 9e − 6) than we previously observed, although effect size remained consistent.This may be due to our WGS dataset being much smaller than our GWAS and targeted sequencing datasets in which we previously identified the BBS9 association.Additionally, the WGS trios were selected for sequencing based on the chromosome 20 (BMP2) association, which was always stronger, and not the chromosome 7 (BBS9) association.To combat potential bias of the previously analyzed families, we performed the additional stratified analysis removing these families.
We further found that our three putative associated variants near BMP2 (alone and in combination) also impacted normal-range human cranial vault shape.Specifically, we found that these variants were associated with a more dolichocephalic shape, echoing the more severe phenotype observed in sNCS and validating our initial hypothesis.These results suggest that normal-range and dysmorphic variation are genetically linked and may be best conceptualized as a continuum.Because our putative variants are relatively common, we expect them to be present in a sizable portion of unaffected individuals.The dolichocephalic tendency we observed suggests an effect on the sagittal suture.A key factor may be timing.Due to rapid brain growth early in life, pre-or peri-natal suture fusion leads to the kinds of severe phenotypic outcomes we typically associate with craniosynostosis.If this fusion is delayed, however, the child may show more subtle changes in vault shape, with few or no other sequalae, and likely within the normal range of variation.This could be considered a subclinical manifestation of craniosynostosis.Two independent lines of evidence support this claim.First, a recent report found that previously undocumented fusion of the sagittal suture is present on CT scans in about 5% of children, none of whom exhibited clinical symptoms of sNSC 47 .Second, this type of "delayed-onset" craniosynostosis has been reported in a congenital rabbit model and shows an intermediate phenotype between early onset and unaffected animals 48 .Similar subclinical manifestations have been reported in orofacial clefting 49 .Thus, this kind of phenotypic variability should be expected in complex and multifactorial traits, where the genotype-tophenotype correlation is heavily modulated.
In summary, we identified a highly significant association in the intergenic region between BMP2 and LINC01428.Identification of potential causal variants proved challenging, as the region was noncoding, no RV association was detected, and none of the significant variants were QTLs.By using CADD scores and TFBS probabilities, we propose that the most likely causal variants are rs6054763, rs6054764, and rs932517, which are predicted deleterious, highly enriched in probands, and probably TFBS.Because of the challenges in attributing potential causality to noncoding variants such as these, future work on these families and larger, more diverse population samples could focus on other genomic analysis, such as copy number variation and other types of omics data, particularly RNA-seq or ATAC-seq analysis, which would allow for accurate quantification of RNA expression levels and chromatin accessibility regions.

Figure 1 .
Figure 1.Genome-wide Manhattan plot of full transmission disequilibrium test analysis for 63 sagittal nonsyndromic craniosynostosis trios.The lines at 7.3 and 5 represent the respective thresholds for genome-wide significant and genome-wide suggestive associations.

Figure 2 .
Figure 2. Zoomed in plot of the genome-wide significant region at 20p12.3.Manhattan plot zoomed in on the chromosome 20 linkage signal.rs1884302 (not shown) is in complete linkage disequilibrium with rs6054748 (shown) within the haplotype containing these variants (red circle).LOC101929265 is an alias for LINC01428.

Figure 3 .
Figure 3. Genome-Wide Manhattan Plot of Transmission Disequilibrium Test Analysis with Multiplex Families Removed (56 remaining Sagittal Nonsyndromic Craniosynostosis Trios).The lines at 7.3 and 5 represent the respective thresholds for genome-wide significant and genome-wide suggestive associations.

Figure 4 .
Figure 4. Diplotypes at sagittal nonsyndromic craniosynostosis risk locus near BMP2 are associated with normal cranial vault shape.Overtransmitted alleles are indicated in bold in the three haplotypes comprising rs6054763, rs932517, and rs6054764 respectively.For each diploid combination of haplotypes, the average cranial vault shape is compared to the average shape of the homozygous GAT carriers who serve as reference.Red and blue indicate outward protrusion and inward depression respectively, with grey indicating no difference.The colorscale applies to the whole figure.Average and standard deviation of cranial index (CI) are indicated for all diploid groups.N indicates the sample size.

Table 1 .
Genome-wide significant variants from transmission disequilibrium test analysis.

Table 3 .
Top five significant variants from transmission disequilibrium test analysis with trios included in our previous genome-wide association study removed.Headers represent -CHR = chromosome, SNP = SNP rsID, Pos = position in base pair, Min allele = minor allele, Maj allele = major allele, OR = odds ratio, P = p-value.

Table 5 .
Differences in minor allele frequency between dataset cohort and general population in candidate variants.Headers represent-SNP = SNP rsID, Dataset MAF = Dataset minor allele frequency, gnomAD NFE MAF = Genome Aggregation Database non-Finnish Europeans minor allele frequency.